Abstract

We  obtain  a  formula  for  the  capacity  of  a  binary  timing  channel  with  general  noise  in  terms 
of  the  unique  solution  of  a  channel  dependent  equation.  We  then  give  three  provably  correct 
algorithms  that  can  be  used  to  solve  this  equation. 


1  Introduction 

Shannon  expressed  the  capacity  of  a  discrete  noiseless  channel  with  variable  symbol  time  durations 
as  the  logarithm  of  the  solution  of  a  certain  equation.  In  [1],  this  analysis  was  carried  out  for  a 
certain  type  of  binary  channel  called  a  “Z  channel”  -  a  channel  in  which  one  symbol  is  subjected 
to  noise  but  the  other  is  not.  As  far  as  we  are  aware,  the  timed  Z  channel  of  [1]  is  the  only  case 
in  which  the  capacity  of  a  timed  binary  channel  with  noise  has  been  studied  in  the  literature.  In 
this  note,  we  study  the  capacity  of  a  binary  timed  channel  with  general  noise,  deriving  a  formula 
for  its  capacity  in  terms  of  the  unique  solution  of  an  equation  that  depends  on  the  channel.  The 
equation  is  nonlinear  so  we  give  three  different  numerical  methods  that  can  be  used  to  solve  it  and 
prove that each of them converges.


2  A  formula  for  the  capacity 

Consider a channel through which one of two symbols $\{o_1, o_2\}$ may be sent. The output received is a symbol in $\{*_1, *_2\}$. If $*_1$ is received, the transmission took $t_1$ units of time; if $*_2$ is received, it took $t_2$ units of time. The output probabilities $y = (y_1, y_2)$ are the product of the input probabilities $x = (x_1, x_2)$ with the matrix of conditional probabilities:

$$y = x \cdot \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

Here $a = P(*_1|o_1)$, $b = P(*_2|o_1)$, $c = P(*_1|o_2)$, $d = P(*_2|o_2)$, so $y_1 = (a - c)x_1 + c$ and $y_2 = 1 - y_1$. To calculate the capacity we want to maximize

$$\frac{H(Y) - H(Y|X)}{E(T)}$$
over all possible $x$. This means we want to maximize the function $I_t : [0,1] \to \mathbb{R}$ given by

$$I_t(x) = \frac{H(f(x)) - xH(a) - (1 - x)H(c)}{t_1 f(x) + (1 - f(x))t_2}$$

where $H : [0,1] \to \mathbb{R}$ is $H(x) = -x\ln x - (1 - x)\ln(1 - x)$ and $f : [0,1] \to \mathbb{R}$ is $f(x) = (a - c)x + c$.
Notice that $a + b = c + d = 1$. The times $t_1$ and $t_2$ are positive real numbers. Recall that entropy $H$ is strictly concave: for $x, y \in [0,1]$ and $p \in [0,1]$, we have

$$H(px + (1 - p)y) \ge pH(x) + (1 - p)H(y)$$

with equality if and only if $p = 0$, $p = 1$ or $x = y$.
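As a rough numerical illustration of the objective just defined, the following sketch evaluates $I_t$ on a grid. The function names `H` and `rate` and the parameter values are assumptions made for this example only; they do not come from the text.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def rate(x, a, c, t1, t2):
    """I_t(x): mutual information (nats) divided by expected symbol time."""
    f = (a - c) * x + c
    return (H(f) - x * H(a) - (1 - x) * H(c)) / (t1 * f + (1 - f) * t2)

# Hypothetical channel parameters, chosen only for illustration.
a, c, t1, t2 = 0.9, 0.2, 1.0, 2.0
grid = [i / 1000 for i in range(1001)]
values = [rate(x, a, c, t1, t2) for x in grid]
best = max(range(len(grid)), key=lambda i: values[i])
print(grid[best], values[best])
```

A grid search like this only locates the maximizer approximately; it is meant as a sanity check on the definition, not as a way to compute the capacity.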

Lemma 2.1 The absolute maximum value of $I_t$ is always assumed at a point in $(0,1)$.

Proof. If $a = c$, then $I_t = 0$, and the claim is trivially true. Suppose now that $a$ and $c$ are different. By strict concavity of $H$, we can write

$$I_t(x) > \frac{H(f(x)) - H(xa + (1 - x)c)}{t_1 f(x) + (1 - f(x))t_2} = 0$$

when $x \in (0,1)$. Then $I_t(1/2) > 0$. However, $I_t(0) = 0 = I_t(1)$, so the maximum value of $I_t$ must be assumed at some point of $(0,1)$. □

Since the maximum of $I_t$ is assumed at a point in the interior of $[0,1]$, the equation $I_t' = 0$ has at least one solution.

Theorem 2.2 If $a$ and $c$ are different, there is a unique $x \in (0,1)$ where $I_t$ assumes its maximum. It is the unique solution on $[0,1]$ of the equation

$$(*) \qquad e^{-K/(a-c)} f(x)^{t_2} - (1 - f(x))^{t_1} = 0$$

where $K = (c\varepsilon - t_2)H(a) + (t_2 - a\varepsilon)H(c)$ and $\varepsilon := t_2 - t_1$.

Proof. At a point $x \in (0,1)$ where $I_t$ takes its maximum, $I_t'(x) = 0$. Let

$$p(x) = H(f(x)) - x \cdot H(a) - (1 - x) \cdot H(c).$$

Notice that $t_1 f(x) + t_2(1 - f(x)) = t_2 - \varepsilon f(x)$. For the sake of readability, we write $I_t'(x)$ as $I_t'$, $f(x)$ as $f$, etc. Then since $I_t' = 0$,

$$(t_2 - \varepsilon f)p' - p(-\varepsilon f') = 0$$

where

$$p' = (\ln(1 - f) - \ln(f))f' - H(a) + H(c)$$

Then $(t_2 - \varepsilon f)p'$ is equal to

$$t_2 f'\ln(1 - f) - t_2 f'\ln(f) - t_2 H(a) + t_2 H(c) - \varepsilon f f'\ln(1 - f) + \varepsilon f f'\ln(f) + \varepsilon f H(a) - \varepsilon f H(c)$$
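and

$$p\varepsilon f' = -\varepsilon f f'\ln(f) - \varepsilon f'(1 - f)\ln(1 - f) - \varepsilon f' x H(a) - \varepsilon f' H(c) + x\varepsilon f' H(c)$$

When we add these expressions, the first term of $p\varepsilon f'$ cancels with the sixth term of $(t_2 - \varepsilon f)p'$, and the second term of $p\varepsilon f'$ causes the first and fifth terms of $(t_2 - \varepsilon f)p'$ to vanish, leaving $t_1 f'\ln(1 - f)$. Thus, our sum reduces to

$$(-t_2 f'\ln(f) - t_2 H(a) + t_2 H(c) + \varepsilon f H(a) - \varepsilon f H(c)) + (t_1 f'\ln(1 - f) - \varepsilon f' x H(a) - \varepsilon f' H(c) + x\varepsilon f' H(c))$$

which by $f = f'x + c$ and properties of logarithms is

$$f'\ln\left(\frac{(1 - f)^{t_1}}{f^{t_2}}\right) + (c\varepsilon - t_2)H(a) + (t_2 - a\varepsilon)H(c)$$

Setting this expression to zero and dividing by $f' = a - c$ gives $\ln\left((1 - f)^{t_1}/f^{t_2}\right) = -K/(a - c)$. Thus, $x$ is a zero of

$$g(x) = e^{-K/(a-c)} f(x)^{t_2} - (1 - f(x))^{t_1}$$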

Now suppose $g$ had two distinct zeroes. Then its derivative would have to be zero at some point in between. But

$$g'(x) = f'\left(e^{-K/(a-c)}\, t_2 f(x)^{t_2 - 1} + t_1(1 - f(x))^{t_1 - 1}\right)$$

which is never zero as the product of $f' = a - c \neq 0$ and a positive number. □
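The statement of Theorem 2.2 can be checked numerically: a zero of $g$ should be a stationary point of $I_t$. The sketch below does this for one set of hypothetical parameters; the helper names (`H`, `rate`, `g`, `solve_g`) and the use of plain bisection are assumptions of the example.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def rate(x, a, c, t1, t2):
    """I_t(x) as defined in Section 2."""
    f = (a - c) * x + c
    return (H(f) - x * H(a) - (1 - x) * H(c)) / (t1 * f + (1 - f) * t2)

def g(x, a, c, t1, t2):
    """Left-hand side of equation (*)."""
    eps = t2 - t1
    K = (c * eps - t2) * H(a) + (t2 - a * eps) * H(c)
    f = (a - c) * x + c
    return math.exp(-K / (a - c)) * f**t2 - (1 - f)**t1

def solve_g(a, c, t1, t2, iters=200):
    """Bisection on [0, 1]; g is monotone there, so its sign flips once."""
    lo, hi = 0.0, 1.0
    lo_negative = g(lo, a, c, t1, t2) < 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid, a, c, t1, t2) < 0) == lo_negative:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical channel parameters, chosen only for illustration.
a, c, t1, t2 = 0.9, 0.2, 1.0, 2.0
x_star = solve_g(a, c, t1, t2)
d = 1e-6
# Central-difference estimate of I_t'(x_star); it should be close to zero.
slope = (rate(x_star + d, a, c, t1, t2) - rate(x_star - d, a, c, t1, t2)) / (2 * d)
print(x_star, slope)
```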

To  reassure  ourselves  that  the  equation  above  is  valid,  let  us  consider  a  few  special  cases  of  it. 

In [1], for the timed Z channel, $c = 0$, so $K = -t_2 H(a)$, $f(x) = ax$ and we get

$$e^{t_2 H(a)/a}(ax)^{t_2} - (1 - ax)^{t_1} = 0$$

This equation easily follows from the following equation encountered in [1]:

$$\left(\frac{t_2}{a}\right)H(a) = t_1\log\left(\frac{1 - f(x)}{f(x)}\right) - \varepsilon\log(f(x))$$

Another case is the untimed case: if $\varepsilon = 0$, then $K = t_2(H(c) - H(a))$, so our equation is now

$$e^{t_2(H(a) - H(c))/(a - c)}(f(x))^{t_2} - (1 - f(x))^{t_1} = 0$$

Taking logs of both sides and simplifying yields

$$\ln\left(\frac{1 - f(x)}{f(x)}\right) = \frac{H(a) - H(c)}{a - c}$$

which  is  the  equation  we  have  to  solve  to  calculate  the  capacity  in  the  untimed  case.  Notice  that 
the dependence on time is eliminated when $\varepsilon = 0$ regardless of the value of $t_2 = t_1$. Finally, in the case of a binary symmetric channel, where the probability of a bit flip is $p$, we have $a = 1 - p = d$ and $b = p = c$, so our equation takes the form

$$u^{t_2} - (k - u)^{t_1} = 0$$

where $u = f(x)e^{H(p)}$ and $k = e^{H(p)}$.
Theorem 2.3 Let $x \in (0,1)$ be the solution of $(*)$ for a timing channel with two symbols and $a \neq c$. Then its capacity, measured in bits per unit time, is

$$\frac{1}{\ln(2)}\left(\frac{H(a)(c - 1) + H(c)(1 - a)}{a - c} - \ln(f(x))\right)\frac{1}{t_1}$$

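Proof. Going backward from $(*)$, we see that $x$ satisfies

$$-\frac{K}{t_1 f'} + \frac{t_2}{t_1}\ln(f(x)) = \ln(1 - f(x)).$$

Substituting this into $I_t$, abbreviating $f(x)$ as $f$, we have

$$\begin{aligned}
I_t(x) &= \frac{-f\ln(f) - (1 - f)\ln(1 - f) - xH(a) - (1 - x)H(c)}{t_2 - \varepsilon f} \\
&= \frac{-f\ln(f) - (1 - f)\left(-\frac{K}{t_1 f'} + \frac{t_2}{t_1}\ln(f)\right) - xH(a) - (1 - x)H(c)}{t_2 - \varepsilon f} \\
&= \frac{-t_1 f f'\ln(f) - (1 - f)\left(-K + t_2 f'\ln(f)\right) - t_1 f' x H(a) - t_1 f'(1 - x)H(c)}{t_1 f'(t_2 - \varepsilon f)}
\end{aligned}$$

Now using $\varepsilon + t_1 = t_2$, $(f - c) = f'x$, we get

$$\begin{aligned}
I_t(x) &= \frac{-t_1 f f'\ln(f) - (1 - f)\left(-K + (\varepsilon + t_1)f'\ln(f)\right) - t_1(f - c)H(a) - t_1(f' - (f - c))H(c)}{t_1 f'(t_2 - \varepsilon f)} \\
&= \frac{(1 - f)K + f'\ln(f)\left(-t_1 f - (1 - f)(\varepsilon + t_1)\right) - t_1(f - c)H(a) - t_1(a - f)H(c)}{t_1 f'(t_2 - \varepsilon f)} \\
&= \frac{(1 - f)K + f'\ln(f)(\varepsilon f - t_2) - t_1(f - c)H(a) - t_1(a - f)H(c)}{t_1 f'(t_2 - \varepsilon f)}
\end{aligned}$$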

Now we focus on the expression $(1 - f)K - t_1(f - c)H(a) - t_1(a - f)H(c)$. It equals

$$H(a)\left((1 - f)(c\varepsilon - t_2) - t_1(f - c)\right) + H(c)\left((1 - f)(t_2 - a\varepsilon) - t_1(a - f)\right)$$

which is

$$H(a)(c - 1)(t_2 - \varepsilon f) + H(c)(1 - a)(t_2 - \varepsilon f)$$

Putting everything together, we get

$$I_t(x) = \frac{(t_2 - \varepsilon f)\left(H(a)(c - 1) + H(c)(1 - a) - f'\ln(f)\right)}{t_1 f'(t_2 - \varepsilon f)} = \frac{H(a)(c - 1) + H(c)(1 - a)}{(a - c)t_1} - \frac{\ln(f)}{t_1}$$

Finally, because capacity is measured in bits, we convert our logarithms to base 2 by multiplying by $1/\ln(2)$. □
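The closed form of Theorem 2.3 can be exercised on a case with a known answer. For an untimed binary symmetric channel ($t_1 = t_2 = 1$, $a = 1 - p$, $c = p$) the capacity should be $1 - H(p)/\ln 2$ bits. The sketch below is one possible way to evaluate the formula; the function name `capacity_bits` and the use of bisection to find $f(x)$ are assumptions of the example.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def capacity_bits(a, c, t1, t2, iters=200):
    """Capacity in bits per unit time via the closed form of Theorem 2.3."""
    eps = t2 - t1
    K = (c * eps - t2) * H(a) + (t2 - a * eps) * H(c)
    E = math.exp(-K / (a - c))
    # Solve E * u**t2 - (1 - u)**t1 = 0 for u = f(x) by bisection on [0, 1].
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if E * mid**t2 - (1 - mid)**t1 > 0:
            hi = mid
        else:
            lo = mid
    u = (lo + hi) / 2
    nats = ((H(a) * (c - 1) + H(c) * (1 - a)) / (a - c) - math.log(u)) / t1
    return nats / math.log(2)

# Untimed binary symmetric channel with an illustrative flip probability.
p = 0.1
bsc = capacity_bits(1 - p, p, 1.0, 1.0)
print(bsc, 1 - H(p) / math.log(2))
```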


Given  that  the  capacity  calculation  depends  entirely  on  our  ability  to  compute  the  solution  of 
(*),  we  now  turn  to  methods  for  calculating  it  which  are  provably  correct. 




3  Algorithms  for  calculating  the  capacity 


We now consider methods for calculating the unique solution of $g(x) = 0$. First notice that it is enough to solve the equation $h(u) = 0$ where

$$h(u) = e^{-K/(a-c)}u^{t_2} - (1 - u)^{t_1}$$

and then obtain $x = f^{-1}(u)$. One way to solve $h(u) = 0$ is to use the bisection method since $h$ changes sign on $[0,1]$ (i.e. $h(0) = -1 < 0$ and $h(1) > 0$). Here is a one point method.
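Before turning to the one point method, the bisection approach just mentioned can be sketched as follows. The helper names `make_h` and `bisect_h`, the iteration count, and the parameter values are assumptions of the example.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def make_h(a, c, t1, t2):
    """Build h(u) = exp(-K/(a-c)) * u**t2 - (1 - u)**t1 for a given channel."""
    eps = t2 - t1
    K = (c * eps - t2) * H(a) + (t2 - a * eps) * H(c)
    E = math.exp(-K / (a - c))
    return lambda u: E * u**t2 - (1 - u)**t1

def bisect_h(h, iters=50):
    """Halve [0, 1] repeatedly, keeping h(lo) <= 0 < h(hi)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return lo, hi

# Hypothetical channel parameters, chosen only for illustration.
a, c, t1, t2 = 0.9, 0.2, 1.0, 2.0
h = make_h(a, c, t1, t2)
lo, hi = bisect_h(h)
u = (lo + hi) / 2
x = (u - c) / (a - c)  # x = f^{-1}(u)
print(u, x)
```

With $t_1 = 1$ and $t_2 = 2$ the equation $h(u) = 0$ is a quadratic in $u$, which gives an independent check on the bracket returned here.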

Theorem 3.1 Let $\phi : [0,1] \to \mathbb{R}$ be the map

$$\phi(x) = x - \frac{h(x)}{M}$$

where the constant $M$ is given by

$$M = t_2 e^{-K/(a-c)} + t_1$$

For any $x \in [0,1]$, the sequence $(\phi^n(x))$ converges to the unique zero $r$ of $h$ on $[0,1]$.


Proof. First, we claim that $\phi'(x) \in (0,1)$ for all $x \in [0,1]$. Since $\phi' = 1 - h'/M$, all we have to show is that $0 < h'(u) < M$ for $u \in [0,1]$. First

$$h'(u) = e^{-K/(a-c)}t_2 u^{t_2 - 1} + t_1(1 - u)^{t_1 - 1} > 0$$

if $0 < u < 1$ while $h'(u) = t_1 > 0$ for $u = 0$. So $h' > 0$ on $[0,1]$. Next we see

$$h'(0) = t_1, \qquad h'(u) < M \text{ for } u \in (0,1), \qquad h'(1) \le t_2 e^{-K/(a-c)}$$

so that $h' < M$ on $[0,1]$. Let

$$c_\phi := \sup_{x \in [0,1]} \phi'(x)$$

By the continuity of $\phi'$, there is $z \in [0,1]$ such that $\phi'(z) = c_\phi$ and so $0 < c_\phi < 1$. Now if we are given distinct points $x < y \in [0,1]$, then by the mean value theorem, there is some $p \in (x, y)$ with

$$|\phi(x) - \phi(y)| = |\phi'(p)| \cdot |x - y| = \phi'(p)|x - y|$$

which means $|\phi(x) - \phi(y)| \le c_\phi|x - y|$. If $r$ is the unique zero of $h$ and $x \in [0,1]$ is any other point,

$$|\phi^n(x) - r| \le c_\phi^n|x - r|$$

which implies $\phi^n(x) \to r$ since $c_\phi < 1$. □
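A minimal sketch of the one point iteration of Theorem 3.1 follows. The helper names and parameter values are assumptions of the example; the times are taken larger than 1 here so that the derivative formula in the proof is finite at both endpoints.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def make_h_and_M(a, c, t1, t2):
    """Return h(u) and the constant M = t2 * exp(-K/(a-c)) + t1."""
    eps = t2 - t1
    K = (c * eps - t2) * H(a) + (t2 - a * eps) * H(c)
    E = math.exp(-K / (a - c))
    return (lambda u: E * u**t2 - (1 - u)**t1), t2 * E + t1

def iterate_phi(h, M, x0, n):
    """Apply phi(x) = x - h(x) / M to x0 a total of n times."""
    x = x0
    for _ in range(n):
        x = x - h(x) / M
    return x

# Hypothetical channel parameters, chosen only for illustration.
a, c, t1, t2 = 0.9, 0.2, 1.5, 2.5
h, M = make_h_and_M(a, c, t1, t2)
# Different starting points in [0, 1] should all approach the same zero r.
roots = [iterate_phi(h, M, x0, 200) for x0 in (0.0, 0.5, 1.0)]
print(roots)
```

The iteration count of 200 is arbitrary; no rate estimate is claimed, in line with the remark in the text that useful convergence estimates are not known.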


The bisection is a “bracketing method” - one of its advantages is that we always have some idea of how close we are to the zero since we carry both an upper and a lower bound, while a potential advantage of the one point method above is that a smaller upper bound $M$ on $h'$ will improve its convergence (though useful estimates of its rate of convergence are unknown to us). Our next method originates from [2]. It attempts to combine the advantages of both the bisection and the one point method in the last theorem.
Definition 3.2 Let $\mathbf{I}\mathbb{R} = \{[a, b] : a \le b \ \&\ a, b \in \mathbb{R}\}$ ordered by reverse inclusion $\sqsubseteq$. We use the following operators on $\mathbf{I}\mathbb{R}$:

•  $l : \mathbf{I}\mathbb{R} \to \mathbb{R} :: [a, b] \mapsto a$

•  $m : \mathbf{I}\mathbb{R} \to \mathbb{R} :: [a, b] \mapsto (a + b)/2$

•  $r : \mathbf{I}\mathbb{R} \to \mathbb{R} :: [a, b] \mapsto b$

These are abbreviated $l_x := l(x)$, $r_x := r(x)$ and $m_x := m(x)$. Let

$$O_h := \{x \in \mathbf{I}\mathbb{R} : [0, 1] \sqsubseteq x \sqsubseteq [r]\}$$

denote the set of intervals where $h$ changes sign.

For instance, the bisection method defines a function $\mathrm{split}_h : O_h \to O_h$ given by

$$\mathrm{split}_h(x) = \begin{cases} [l_x, m_x] & \text{if } h(m_x) > 0; \\ [m_x, r_x] & \text{otherwise.} \end{cases}$$

If we iterate this function beginning, say, from $x = [0, 1]$, then $(\mathrm{split}_h^n(x))$ is a decreasing sequence of intervals which contain $r$ and whose lengths tend to zero.


Theorem 3.3 Iterating the mapping $s_h : O_h \to O_h$ given by

$$s_h(x) = \begin{cases} [l_x,\ m_x - (h(m_x)/M)] & \text{if } h(m_x) > 0; \\ [m_x + (|h(m_x)|/M),\ r_x] & \text{otherwise;} \end{cases}$$

is an algorithm for calculating $r$. That is, it produces a decreasing sequence of intervals which contain $r$ and whose lengths tend to zero:

$$\bigsqcup_{n \ge 0} s_h^n(x) = [r]$$

for all $x \in O_h$. Thus, for all $x \in O_h$, if $x_n \in s_h^n(x)$ for each $n$, then $x_n \to r$.
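The interval map $s_h$ can be compared with plain bisection in a few lines. This is a sketch under stated assumptions: the function names, the step count, and the channel parameters are illustrative, and the example relies on $M$ bounding $h'$ on $[0,1]$, which holds for the chosen times $t_1, t_2 > 1$.

```python
import math

def H(x):
    """Binary entropy in nats, using the convention 0 * ln(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def make_h_and_M(a, c, t1, t2):
    """Return h(u) and the constant M = t2 * exp(-K/(a-c)) + t1."""
    eps = t2 - t1
    K = (c * eps - t2) * H(a) + (t2 - a * eps) * H(c)
    E = math.exp(-K / (a - c))
    return (lambda u: E * u**t2 - (1 - u)**t1), t2 * E + t1

def split_step(h, lo, hi):
    """One bisection step: keep the half on which h changes sign."""
    m = (lo + hi) / 2
    return (lo, m) if h(m) > 0 else (m, hi)

def s_step(h, M, lo, hi):
    """One step of s_h: bisect, then pull the new endpoint in by |h(m)| / M."""
    m = (lo + hi) / 2
    hm = h(m)
    if hm > 0:
        return lo, m - hm / M
    return m + abs(hm) / M, hi

# Hypothetical channel parameters, chosen only for illustration.
a, c, t1, t2 = 0.9, 0.2, 1.5, 2.5
h, M = make_h_and_M(a, c, t1, t2)
steps = 20
b_lo, b_hi = 0.0, 1.0
s_lo, s_hi = 0.0, 1.0
for _ in range(steps):
    b_lo, b_hi = split_step(h, b_lo, b_hi)
    s_lo, s_hi = s_step(h, M, s_lo, s_hi)
print("bisection width:", b_hi - b_lo)
print("s_h width:      ", s_hi - s_lo)
```

Each application of `s_step` returns a subinterval of what `split_step` would return on the same input, so the bracket after `steps` iterations is no wider than $2^{-\text{steps}}$.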


This result is a special case of a more general result given in [2]. Its proof uses the fact that $h$ has a unique zero $r$ with $h > 0$ to the right of $r$ and $h < 0$ to the left of $r$. By incorporating an upper bound of $h'$, this method outdoes the bisection method at every iteration. While this method does use information about the first derivative in the computation (the upper bound $M$), it does not use the derivative itself as part of the computation. For this reason, the general version of the algorithm in [2], which applies to the class of Hölder continuous functions, is capable of beating the bisection at every iteration without requiring any differentiability. The nowhere differentiable function introduced in 1872 by Weierstrass is a well-known example of this type.